QBSUM: A large-scale query-based document summarization dataset from real-world applications
Abstract
Query-based document summarization aims to extract or generate a summary of a document which directly answers or is relevant to the search query. It is an important technique that can be beneficial to a variety of applications such as search engines, document-level machine reading comprehension, and chatbots. Currently, datasets designed for query-based summarization are short in numbers, and existing datasets are also limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality large-scale dataset consisting of 49,000+ data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their high-speed inference and superior performance via both offline experiments and online A/B tests. The QBSUM dataset is released in order to facilitate future advancement of this research field.
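To make the task definition concrete, the following is a minimal sketch of an unsupervised extractive baseline. It is not one of the solutions proposed in the paper: it simply ranks sentences by how many tokens they share with the query. The function names (`tokenize`, `split_sentences`, `summarize`) and the `max_sentences` parameter are hypothetical, and the regex tokenizer is an assumption that only suits whitespace-delimited languages; Chinese text would need a proper word segmenter in its place.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real (e.g. Chinese) segmenter."""
    return re.findall(r"\w+", text.lower())


def split_sentences(document):
    """Naive splitter on ., !, ? and their full-width forms."""
    parts = re.split(r"(?<=[.!?。！？])\s*", document)
    return [p.strip() for p in parts if p.strip()]


def summarize(query, document, max_sentences=2):
    """Pick the sentences sharing the most tokens with the query.

    Returns at most max_sentences sentences, in document order.
    Sentences with no token overlap are never selected.
    """
    query_tokens = set(tokenize(query))
    sentences = split_sentences(document)
    scored = []
    for index, sentence in enumerate(sentences):
        overlap = len(query_tokens & set(tokenize(sentence)))
        if overlap > 0:
            scored.append((overlap, index))
    # Highest overlap first; earlier sentences win ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    chosen = sorted(index for _, index in scored[:max_sentences])
    return [sentences[i] for i in chosen]
```

Because the score is raw token overlap, common words such as "the" count as matches; a more careful baseline would likely remove stop words or weight tokens (for example with TF-IDF) before ranking.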
Similar resources
RDF Keyword-based Query Technology Meets a Real-World Dataset
This paper presents the results of an industrial project, conducted by the TecGraf Institute and Petrobras (the Brazilian Petroleum Company), to develop a tool to facilitate access to a large database, with hydrocarbon exploration data, by combining RDF technology with keyword search. The tool features an algorithm to translate a keyword query into a SPARQL query such that each result of the SP...
Query-Based Summarization Based on Document Graphs
Text summarization is an important problem, which has numerous applications. This problem has been extensively studied and many approaches have been proposed in the literature for its solution. One of the most challenging problems in the field of text summarization is generating a user-focused summary based on a query. In this paper, we investigate a new approach that tackles this problem and p...
Evaluation Challenges in Large-Scale Document Summarization
We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 Million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200 Million automatic document and summary...
CLASSY Query-Based Multi-Document Summarization
Our summarizer is based on an HMM (Hidden Markov Model) for sentence selection within a document and a pivoted QR algorithm to generate a multi-document summary. Each year, since we began participating in DUC in 2001, we have modified the features used by the HMM and have added linguistic capabilities in order to improve the summaries we generate. Our system, called “CLASSY” (Clustering, Lingui...
MultiSum: Query-Based Multi-Document Summarization
This paper describes a generic, open-domain multi-document summarisation system which combines new and existing techniques in a novel way. The system is capable of automatically identifying query-related online documents and compiling a report from the most useful sources, whilst presenting the result in such a way as to make it easy for the researcher to look up the information in its original ...
Journal
Journal title: Computer Speech & Language
Year: 2021
ISSN: 1095-8363, 0885-2308
DOI: https://doi.org/10.1016/j.csl.2020.101166